ABSTRACT


The aim of this thesis was to determine the feasibility of identifying a device
connected to the Internet through multiple interfaces (i.e., multi-homed) using only the
information provided by passively observing network traffic. Since multi-homed hosts 
allow an alternate means for outside entities to circumvent the security of a firewall and 
gain access to a network, it is important for a network’s security to be able to detect and 
remove such devices. In this work, clock skew, which is the difference in perceived
time between two system clocks, is used as a unique signature to identify hosts on a
network that are potentially multi-homed. Testing was done on a
software-defined network that contained a multi-homed host. After traffic between hosts 
was collected and analyzed, analysis of the confidence intervals of the device’s clock 
skew was conducted to determine if IP addresses originating from the same host could be 
successfully detected solely from network traffic. Testing confirmed that the proposed 
scheme provided a valid means of detecting a multi-homed device on a network. This 
scheme was repeated on multiple hosts and on a device with multiple connections to the 
network. 
I. INTRODUCTION


Network security remains a major concern for all communications systems. With
the advent of panoptic, or comprehensive, network management techniques such as
software-defined networking (SDN), the ability of a system administrator to leverage the
monitoring functions of a panoptic controller has led to the development of a wide
range of applications for network control and security, including monitoring applications
for maintaining the security and integrity of one’s network [1].

A. BACKGROUND AND MOTIVATION 

A variety of security and cyber related concerns exist for any network. Before an 
attack can be conducted on a network, an attacker must first gain access. One method to 
prevent this is the use of a firewall between a private network and the Internet. A 
potential security flaw in a network is the existence of a multi-homed host [2]-[4]. 
Through the use of multiple interfaces on a host, the security of a network and the 
integrity of its firewall can be circumvented. 

A multi-homed host is a device connected to the Internet through multiple 
interfaces [2]-[4]. If one of these connections is to a private network and the other to the 
open Internet, this provides a possible access vector that bypasses the network’s firewall 
[4]. This threat calls for the need to be able to detect if a multi-homed host exists on a 
network and is the motivation behind this research. 

B. THESIS OBJECTIVES AND APPROACH 

The goal of this thesis is to develop a scheme for detecting multi-homed hosts in a 
panoptic network such as a SDN. A framework for an application that can be used to 
detect hosts using multiple interfaces that are independent of their Internet Protocol (IP) 
or Media Access Control (MAC) address is provided in this thesis. 

The objective of this thesis is to identify techniques and monitoring schemes that 
can be used to increase the security of a network. In this thesis, we investigate the use of 
the clock skew of a host compared to a designated fingerprinter as a unique identifier. If a 
unique clock skew correlates to two or more unique IP addresses on the network, this 
represents a possible multi-homed device. In this work, analyses are conducted based on
the confidence intervals of the calculated clock skews to determine if two similar clock
skews represent the same, multi-homed host.

C. RELATED WORK 

The idea of using the clock skew of a host for remote physical device
fingerprinting was first suggested in [5]. It was shown that modern computer chips had
detectable and distinguishable clock skews that could be calculated by observing the
Transmission Control Protocol (TCP) timestamps from traffic on the network. It was then
verified that the clock skew of a device remained constant even when using separate
Ethernet and Wi-Fi interfaces originating from the same device [6].

This idea was further used as an enumeration tool in [7]. Researchers used the clock
skews of devices to determine the number of hosts active behind a network address
translator (NAT). This was accomplished by counting the number of unique clock skews
encountered in traffic exiting a NAT and correlating them to unique devices [7].

In this thesis, these ideas are expanded upon and used to detect multi-homed
devices active on a SDN. Since the clock skew of a device is constant and
independent of the interface used, it can be used as a fingerprint for a multi-homed
device. We also conduct a confidence interval analysis of the clock skew data
encountered on the network to identify devices that appear to be separate based on IP
address but originate from the same device.

D. THESIS ORGANIZATION 

The remainder of this thesis is organized as follows. In Chapter II, the security
threats posed by a multi-homed host, the architecture and routing procedures within a
SDN, and the system clock and its unique properties are introduced. The proposed
scheme for multi-homed device detection is described in detail in Chapter III, while the
results of the experiment are contained in Chapter IV. A description of the network that
was used to test the feasibility of using clock skews to detect the presence of a
multi-homed host is included. Finally, the thesis is concluded in Chapter V, where significant
results and recommendations for future work are presented.
II. BACKGROUND


Network security continues to be a vital concern for a constantly connected 
society. One such concern is the access afforded to a network via a multi-homed host [2].
Mitigating the threat on a SDN by detecting such a device is the focus of this research. 
Before proposing the detection scheme for such a device, the relevant background 
information is presented in this chapter to introduce the threats and tools that are used to 
mitigate them. First, the basics of a multi-homed host and how such a device can be used 
to bypass a network’s security are discussed. Then, the architecture and routing 
procedures of a SDN are presented. The system clock of a network device is discussed, 
and how it can be used as a unique identifier is presented. Lastly, the concepts of
confidence intervals and their role in hypothesis testing are described. 

A. MULTI-HOMED HOST 

A multi-homed host is one that has multiple connections to a network or 
networks. This can be accomplished by having multiple network interface cards (NICs) 
installed in the same host, which provides a host with multiple MAC and IP addresses 
[3]. Multi-homed hosts are used in a network for redundancy purposes [2]. With a
multi-homed host on a network, the reliability of a network’s access can be increased. Access
node failure can be mitigated, and the connectivity from an Internet service provider 
(ISP) can be made more reliable by having separate connections to separate ISPs [8]. 

1. Security Threats with Multi-Homed Hosts 

The threat from a multi-homed host comes from the fact that a multi-homed host 
can be used to bypass the firewall between an internal network and the Internet [2]. 
Certain operating systems, such as Windows, were never meant to isolate two interfaces 
within a host and often integrate traffic from one to the other [2]. This results in the 
ability for an infection on one network to be passed to another. 

Closed networks are protected from the Internet by firewalls, which only allow 
designated traffic to flow between the two mediums. If a host is multi-homed, this allows 
for the opportunity to bypass the firewall and provide access to a closed network [4].
Once access to a host on a closed network is gained, potential threats can map a network 
and begin an exploitation process or infect the network with malicious code. An example 
of such a network configuration is depicted in Figure 1. Throughout this research, the 
threat of a multi-homed host serving as an access vector to a network is to be mitigated 
by the ability to detect the presence of such a host on the network. 



Figure 1. A Network with a Multi-Homed Host Implemented to Bypass the Firewall to the Internet

B. SOFTWARE-DEFINED NETWORKS 

A software-defined network is an innovative networking scheme in which the 
control and data planes within a network are logically separated. In a SDN, the routing 
functions for the network are controlled from a centralized location, known as the 
controller [1], [9], [10]. This centralized controller is able to view the operation of the 
entire network, allowing it to monitor and react to any potential hazards that may exist 
[10].


1. Architecture 

A SDN is divided into three planes that each interact to control the functionality 
of the network as shown in Figure 2. The lowest plane is the data plane, which consists of 
switches that forward packets based on flow rules [1]. Above the data plane is the control 
plane. From here, network traffic is monitored, and the flow rules for designated packets
are determined [1], [9], [11]. The controller can be programmed by applications, allowing
the network to dynamically react to any changes within the network. This is done at the
upper plane, known as the application plane of the network [1], [11].



Figure 2. Functional Planes within a Software-Defined Network. Source: [1].

2. Routing 

Routing within a SDN is completed using flow rules that are determined at the
control plane and stored at the data plane. Due to this functionality, routing is now a
rule-based process rather than a destination-based process [1]. A SDN operates as a Transmission
Control Protocol/Internet Protocol (TCP/IP) network and uses the OpenFlow protocol for
its rule-based routing. The OpenFlow protocol matches packets to designated flow rules
within a flow table at the data plane. If no such rule exists, the packet is forwarded to the
control plane where a decision is made as to how it should be routed. Once this 
determination is made, the packet is forwarded back to the data plane for routing along 
with updates for the flow tables for future routing decisions [10], [11]. 
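
The table-miss behavior described above can be illustrated with a toy model. The following is a minimal sketch and not the OpenFlow protocol itself: the `Switch` and `Controller` classes, the (source, destination) match key, and the action strings are hypothetical simplifications, since real flow rules match on many more header fields.

```python
from dataclasses import dataclass, field


class Controller:
    """Toy control plane that picks an output port for unknown flows."""

    def __init__(self, port_of):
        self.port_of = port_of   # hypothetical map: destination IP -> port
        self.seen = []           # packets that reached the control plane

    def decide(self, packet):
        self.seen.append(packet)
        port = self.port_of.get(packet["dst"])
        return f"output:{port}" if port is not None else "drop"


@dataclass
class Switch:
    """Toy data-plane switch holding a flow table (match key -> action)."""

    flow_table: dict = field(default_factory=dict)

    def handle(self, packet, controller):
        key = (packet["src"], packet["dst"])
        if key not in self.flow_table:
            # Table miss: ask the control plane, then cache the new rule.
            self.flow_table[key] = controller.decide(packet)
        return self.flow_table[key]


controller = Controller({"10.10.13.6": 3})
switch = Switch()
pkt = {"src": "10.10.13.1", "dst": "10.10.13.6"}
first = switch.handle(pkt, controller)    # miss: controller is consulted
second = switch.handle(pkt, controller)   # hit: answered from the flow table
```

Only the first packet of a flow reaches the controller in this model; later packets are handled from the cached rule, which mirrors the flow-table update step described in the text.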
C. SYSTEM CLOCKS


Networked devices all have internal electronic clocks that are built from both
hardware and software components. These clocks control all timing functions for the
device [12]. Within these electronic clocks, crystal oscillators are used to determine the
clock signal and the rate at which the clock ticks [13]. These crystal oscillators each 
operate at different, unique frequencies due to the crystal type, the manufacturing 
parameters, and the small imperfections that are inherent to all manufacturing procedures 
[13], [14]. Due to these factors, clocks within a device operate at slightly different 
frequencies independent of clock type or manufacturing series [13]. This makes the 
system clock within a device a unique characteristic that can be exploited to identify that 
device. 


1. TCP Timestamps 

The TCP header consists of a standard 20 bytes of information followed by a 
portion of data allocated to options within the protocol [15], which is shown in Figure 3. 
In the options section of the header of a TCP packet is a field for the TCP timestamp. The 
TCP timestamp is a one-up counter based off a device’s system clock that was introduced 
in RFC 1323 as a means of accurately measuring the round-trip time (RTT) between two 
devices. The need for accurately measuring the RTT of a packet is to provide a basis for 
determining the retransmission timeout interval (RTO) for lost or unacknowledged 
packets [16]. 


[Figure 3 layout: TCP Header = Header (20 bytes) + Options (variable length), followed by Payload]

Figure 3. TCP Header with the Options Segment. Source: [15].


The TCP timestamp value is determined by a virtual “timestamp clock” that is 
based on the frequency of operation of the device’s system clock. By observing the 
values of TCP timestamps, one can observe the operation of the system clock [5]. The
TCP timestamp is a second-order effect of the system clock and is the means by which the
clock skew is calculated in this research.

2. Clock Skew 

The clock skew of a device is the difference in the operating frequencies of its 
system clock relative to the clock frequency of another device [5]. It is this parameter that 
can be used to identify the device based solely through passively observing network 
traffic. When using clock skew as a unique identifier, the identifier is valid only in 
relation to a designated device. This device is known as the fingerprinter [5]. In a SDN, 
the controller can be designated as the fingerprinter due to its ability to monitor all 
devices connected to the network. 

D. CONFIDENCE INTERVALS 

Confidence intervals are used in this research to bound the uncertainty of the
calculated clock skews due to the randomness of the data collected and because the true
mean value of the clock skew $\mu$ cannot be exactly measured or known. The clock skew is
a random variable $\alpha$ that is assumed to be Gaussian with a density function $f_\alpha(\alpha)$. A
confidence interval provides a range of values in which the true mean value
lies with a specified probability $1-\varepsilon$ [17].

The confidence interval is defined as the range of $C_L$ to $C_U$ such that

$$P[C_L < z < C_U] = C, \tag{1}$$

where $C$ is the desired confidence probability between zero and one for a given parameter
$z$ [17]. The value $C = 1-\varepsilon$, where $\varepsilon$ is the acceptable error. The bounds of a confidence
interval for the density function $f_\alpha(\alpha)$ with an accepted error $\varepsilon$ and true mean $\mu$ are shown
in Figure 4. The bounds of this confidence interval $C_L$ and $C_U$ are determined by solving
[18]

$$\frac{\varepsilon}{2} = \int_{C_U}^{\infty} f_\alpha(\alpha)\,d\alpha \tag{2}$$

and

$$\frac{\varepsilon}{2} = \int_{-\infty}^{C_L} f_\alpha(\alpha)\,d\alpha. \tag{3}$$



Figure 4. The Bounds $C_L$ and $C_U$ of the Confidence Intervals of a Given Density
Function with a True Mean $\mu$ and an Acceptable Error $\varepsilon$. Source: [18].

Confidence intervals are used in hypothesis testing to decide between two 
possible scenarios. If a hypothesis $H_0$ is made about a parameter and that parameter falls
within the range of a confidence interval, then that hypothesis is accepted with a 
confidence level of C [17]. This idea is used in this thesis to analyze the clock skews of 
the devices on the network to determine if they originate from the same device. 
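
As an illustration of (1) through (3), the bounds of a two-sided interval for a Gaussian density can be computed numerically. This is a minimal sketch that assumes the mean and standard deviation are already known; the function names and the sample values (a 10.0 ppm mean with a 0.5 ppm standard deviation) are hypothetical and are not taken from the experiment.

```python
from statistics import NormalDist


def confidence_bounds(mean, std, eps):
    """Return (C_L, C_U) leaving probability eps/2 in each Gaussian tail."""
    dist = NormalDist(mu=mean, sigma=std)
    c_l = dist.inv_cdf(eps / 2)       # lower tail holds eps/2, as in (3)
    c_u = dist.inv_cdf(1 - eps / 2)   # upper tail holds eps/2, as in (2)
    return c_l, c_u


def accept_h0(value, bounds):
    """Accept H0 when the tested value lies inside the interval."""
    c_l, c_u = bounds
    return c_l < value < c_u


# Hypothetical numbers: 95% interval (eps = 0.05) around a 10.0 ppm mean.
bounds = confidence_bounds(mean=10.0, std=0.5, eps=0.05)
inside = accept_h0(10.2, bounds)
outside = accept_h0(12.0, bounds)
```

With these assumed inputs the interval is roughly 10.0 plus or minus 0.98 ppm, so a value of 10.2 ppm is accepted under the hypothesis and 12.0 ppm is rejected.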

The possible threats to a network from devices known as multi-homed hosts,
devices with more than one interface connected to a network, were introduced in this
chapter. We then described the functionality of a SDN and the advantages of such a
network as compared to traditional routing procedures. Finally, the system clock of a 
device was described in detail along with how to measure its unique operating parameters 
and use that information as a unique identifier for the device. This information is utilized 
in Chapter III and demonstrated in Chapter IV in a scheme for detecting a multi-homed 
device active on a SDN. 
III. MULTI-HOMED DEVICE DETECTION SCHEME USING CLOCK SKEW

The problem we present in this research is how to detect a host on a network
using multiple connections through multiple NICs. In order for a solution to be achieved, 
we must determine whether or not the traffic between different IP addresses can be 
correlated in order to determine if those IP addresses belong to a multi-homed host. Two 
assumptions are made in developing the proposed scheme. The first is that passive means 
of collection are used over the network. The second is that the observer can observe and 
collect traffic from all IP addresses of a multi-homed host. 

The rest of this chapter is organized as follows. First, the proposed scheme for 
detecting multi-homed hosts is presented based on a host’s clock skew. Then, we discuss 
the network configuration and the method of generating and collecting traffic. Finally, 
TCP timestamps are described, and the method of calculating the clock skew of a host is 
presented. 

A. PROPOSED SCHEME 

The proposed solution is to collect TCP timestamp data from a host in order to 
calculate its clock skew for use as a fingerprint. The clock skew of a host is unique and 
has very little variation over time. It has been demonstrated that the clock skew of a host 
stays relatively constant even if two interfaces (Ethernet and Wi-Fi) are used to connect 
to a network. For these reasons, the clock skew can be used as an identifier for a given 
host [5], [6], [19]. The aim of this thesis is to determine whether multiple IP addresses 
with similarly calculated clock skews are from the same device. 

The first step in the proposed process is to monitor and collect traffic across the 
network. The traffic of interest is the TCP segments exchanged between hosts, 
specifically those containing TCP timestamps. From this information, the clock skew of 
each host relative to a central host (the fingerprinter) can be calculated. After the clock 
skews of each host on the network are determined, analysis is conducted based on 
hypothesis testing using confidence intervals to identify potential multi-homed hosts. A 
testbed network with a fingerprinter is shown in Figure 5, and the process for detecting a
multi-homed host is outlined in Figure 6. 



Figure 5. Generic Network Configuration of a SDN with a Controller, Two Switches, and n Number of Hosts with One Acting as the Fingerprinter for Testing and Another Multi-Homed



Figure 6. Process of Detecting Multi-Homed Devices Using Clock Skew 


In previous work, this method was utilized for the determination of the number of 
hosts behind a NAT. It was suggested in [5] and shown in [7] that one could determine 
the number of hosts sending traffic through a NAT by calculating and comparing the 
unique clock skews encountered. In this thesis, we propose correlating clock
skews between multiple IP addresses to determine if they originate from the same,
multi-homed device.

B. NETWORK CONFIGURATION 

To test our proposed scheme, we collect and analyze traffic from hosts on a 
network. A version of the network layout is shown in Figure 7. Multiple hosts are 
connected to each switch with one host among them being multi-homed. The
multi-homed host uses separate Ethernet connections to connect to the network. A central host
acts as the fingerprinter for determining the clock skews of all hosts on the network [5]. 
The fingerprinter is chosen so that it has the ability to observe traffic from both 
connections of the multi-homed host. 



Figure 7. Configuration of a SDN with a Multi-Homed Host Connected to a Single Switch


C. CLOCK SKEW 

In order to test the proposed scheme, network traffic containing TCP segments 
with timestamps was collected. Using this data, the clock skew of each host can be 
calculated. 
1. Traffic Monitoring

The fingerprinter monitors the network from a centralized location for TCP 
segments in the network traffic. Not all of these TCP segments seen in the network traffic 
will contain timestamps. It is the segments with TCP timestamps originating from a host 
and sent to the fingerprinter that are of interest. These segments are aggregated, and the 
TCP timestamps are used in the calculation of the clock skew. These TCP timestamps are 
collected along with the time of collection based on the fingerprinter’s own clock. The 
calculation for the clock skew based on this data is discussed in more detail later in this 
chapter. 
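
The filtering and aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions: each captured packet is represented by a hypothetical dictionary with `time`, `src`, `dst`, and `tsval` keys, and the sample values are invented. It is not the capture procedure used in the experiment.

```python
from collections import defaultdict


def group_timestamps(packets, fingerprinter):
    """Group (capture_time, TSval) pairs by source host.

    Only segments that carry a TCP timestamp and are addressed to the
    fingerprinter are kept; 'tsval' is None when the option is absent.
    """
    by_host = defaultdict(list)
    for pkt in packets:
        if pkt["dst"] != fingerprinter or pkt["tsval"] is None:
            continue
        by_host[pkt["src"]].append((pkt["time"], pkt["tsval"]))
    return dict(by_host)


# Invented sample capture; times are read from the fingerprinter's clock.
packets = [
    {"time": 0.00, "src": "10.10.13.6", "dst": "10.10.13.1", "tsval": 1000},
    {"time": 0.01, "src": "10.10.13.1", "dst": "10.10.13.6", "tsval": 5000},
    {"time": 0.50, "src": "10.10.13.6", "dst": "10.10.13.1", "tsval": None},
    {"time": 1.00, "src": "10.10.13.6", "dst": "10.10.13.1", "tsval": 1100},
]
grouped = group_timestamps(packets, "10.10.13.1")
```

Pairing each timestamp with the fingerprinter's own capture time at this stage keeps both quantities needed for the later offset calculation together.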

2. TCP Timestamps 

TCP timestamps were introduced as a means to provide a simple and accurate tool 
to measure the RTT of a packet transmission [16]. TCP is meant to be a reliable 
connection-oriented protocol, and this reliable connection is achieved by the 
retransmission of lost or dropped packets. The duration of time before retransmissions are 
sent is known as the RTO and is calculated by knowing the RTT of a packet. TCP 
timestamps provide a simple and accurate means of determining this RTT by sending and 
echoing relative timing information within the TCP packet [16]. 

The timestamp is included in the TCP options portion of the header and consists
of 10 bytes of data. The format of these 10 bytes of data is shown in Figure 8. The first
byte is the option kind, the second byte is the length of the option field, the next
four bytes contain the current value of the sender’s timestamp, and the final four bytes are
an echo of the timestamp received [16].


Kind (1 byte) | 10 (length, 1 byte) | TS Value (TSval) (4 bytes) | TS Echo Reply (TSecr) (4 bytes)

Figure 8. TCP Timestamp Options Field. Source: [16].
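
The 10-byte layout above can be read directly from the raw option bytes. The following is a minimal sketch, assuming the options field has already been extracted from a TCP header; the function name is hypothetical, and the option kind value of 8 for timestamps comes from the TCP timestamps specification rather than from this thesis. The sample bytes reuse the TSval and TSecr values that appear later in Figure 13.

```python
import struct

TIMESTAMP_KIND = 8      # option kind assigned to TCP timestamps
TIMESTAMP_LENGTH = 10   # total option length in bytes


def parse_timestamp_option(options):
    """Return (TSval, TSecr) from raw TCP option bytes, or None if absent."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                 # End of Option List
            break
        if kind == 1:                 # No-Operation padding is one byte
            i += 1
            continue
        if i + 1 >= len(options):
            break                     # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                     # malformed length, stop parsing
        if kind == TIMESTAMP_KIND and length == TIMESTAMP_LENGTH:
            # Two unsigned 32-bit integers in network byte order.
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None


# Two NOP bytes followed by a timestamp option (12 bytes in total).
example = bytes([1, 1, 8, 10]) + struct.pack("!II", 950958840, 1433177783)
tsval, tsecr = parse_timestamp_option(example)
```

Skipping unknown options by their length byte lets the parser find the timestamp even when other options precede it.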


The value of the timestamp comes from a virtual internal clock that is known as
the “timestamp clock” and is based upon the device’s own clock [16]. TCP timestamps
are a second-order effect of the host’s system clock, and through their collection and
measurement, the operation of a host’s system clock can be observed [5].


3. Clock Skew 


The clock skew is a physical trait of a host’s processor caused by the different 
operating frequencies of crystal oscillators within electronic clocks. The discrepancy in 
operating frequencies is a product of the manufacturing process and results in small 
differences in clock speed of each clock [13], [19]. This difference in frequencies 
between the system clocks of separate devices is calculated as the first derivative of a 
function that includes the offset of their observed times [5], [6], [19]. 

Once the TCP timestamps have been collected, the clock skew can be calculated 
based on the procedure provided in [5]. The first step is to determine the time and TCP 
timestamp offsets of a collected packet versus the initial time of collection. The first 
packet collected by the fingerprinter from a host is used as the baseline for the offset. The 
time offset is given by [5] 


$$x_i = t_i - t_1, \tag{4}$$

where $x_i$ is the difference between the time of collection of the packet $t_i$ and the
initial time of collection $t_1$. The timestamp offset $w_i$ for the packet is given by


$$w_i = \frac{T_i - T_1}{f}, \tag{5}$$

where $T_i$ is the timestamp of the packet, $T_1$ is the timestamp of the first packet at the
initial time of collection, and $f$ is the operating frequency of the host’s clock.


Once the time and timestamp offsets are known, the difference $y_i$ between the
observed time at the fingerprinter and the observed time from the source host based on its
timestamps is calculated as

$$y_i = w_i - x_i. \tag{6}$$


Given the set of points $x_i$ and $y_i$ for the data collected, the set of offset values $O_T$
for $N$ collected packets is represented as

$$O_T = \{(x_i, y_i) : i \in \{1, \ldots, N\}\}, \tag{7}$$

and we model the data as a slope-intercept line equation.
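
The offset calculations in (4) through (6) can be sketched in a few lines. This is an illustrative sketch only: the function name, the sample values, and the 100 Hz timestamp clock frequency are assumptions, and wraparound of the 32-bit timestamp counter is not handled.

```python
def offsets(samples, freq_hz):
    """Map (capture_time_s, TSval) samples to the points (x_i, y_i).

    freq_hz is the assumed tick rate f of the host's timestamp clock.
    """
    t1, ts1 = samples[0]                  # first packet is the baseline
    points = []
    for t_i, ts_i in samples:
        x_i = t_i - t1                    # (4) time offset at the fingerprinter
        w_i = (ts_i - ts1) / freq_hz      # (5) timestamp offset in seconds
        points.append((x_i, w_i - x_i))   # (6) y_i = w_i - x_i
    return points


# Invented samples: the host's clock gains 10 ms over 2 s of capture.
samples = [(100.0, 5000), (101.0, 5100), (102.0, 5201)]
pts = offsets(samples, freq_hz=100.0)
```

The first point is always (0, 0) because every offset is taken relative to the first packet collected from the host.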


The clock skew is the first derivative (or slope) $\alpha$ of this line

$$\alpha \cdot x_i + \beta \ge y_i, \tag{8}$$

with a $y$-intercept of $\beta$ that fits the upper bound of the set of points $O_T$. The solution to (8)
is obtained using a linear programming technique with the goal to minimize the objective
function $J$

$$J = \frac{1}{N} \sum_{i=1}^{N} \left(\alpha \cdot x_i + \beta - y_i\right) \tag{9}$$

for $N$ packets [5]. This procedure is repeated for each host on the network.
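
One way to obtain the upper-bound line without a linear programming solver is sketched below. This swaps in a convex-hull method in place of a general solver: because the objective depends on the line only through its value at the mean of the $x_i$, the minimizing line lies along the upper-hull edge that spans that mean. The function names and sample points are hypothetical, and the sketch assumes the objective is the mean vertical distance between the line and the points.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def fit_upper_bound(points):
    """Return (alpha, beta) of the line lying on or above every point that
    minimizes the mean vertical distance to the points."""
    best = {}
    for x, y in points:                       # keep the highest y for each x
        best[x] = max(y, best.get(x, y))
    pts = sorted(best.items())
    if len(pts) < 2:
        raise ValueError("need at least two distinct x values")
    hull = []
    for p in pts:                             # upper hull, left to right
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    x_bar = sum(x for x, _ in points) / len(points)
    x_bar = min(max(x_bar, hull[0][0]), hull[-1][0])   # guard against rounding
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x_bar <= x1:
            alpha = (y1 - y0) / (x1 - x0)
            return alpha, y0 - alpha * x0
    raise RuntimeError("mean x not covered by the hull")


# Invented points: a 10 ppm skew with some packets delayed (pushed below the line).
points = [(0.0, 0.0), (100.0, -0.001), (200.0, 0.002), (300.0, 0.002), (400.0, 0.004)]
alpha, beta = fit_upper_bound(points)
```

Delayed packets fall below the fitted line and do not pull the slope down, which is the reason for fitting the upper bound instead of a least-squares line.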


D. DETECTION OF MULTI-HOMED HOSTS 

Once the clock skews have been calculated, a comparison must be made in order 
to determine which IP addresses represent the potential multi-homed host in the network. 
To improve accuracy, a large number of trials are required. Based on the central limit 
theorem, the sample mean of independent random variables approaches a Gaussian 
distribution [20]. Consequently, given a relatively large number of trials, we assume that 
the clock skews calculated for each host over these trials approach a Gaussian
distribution. 


After the mean clock skew is determined for a host, analysis is done using the 
confidence intervals for the clock skew of all hosts and hypothesis testing to determine 
whether the IP addresses belong to a multi-homed host. 

The sample mean $m_i$ of the clock skew for the $i$th host is determined as

$$m_i = \frac{1}{N} \sum_{n=1}^{N} \alpha_{i,n}, \tag{10}$$

where $\alpha_{i,n}$ is the clock skew calculated for the $i$th host in trial $n$ of $N$ trials [18]. Now we formulate the
following hypotheses. The first hypothesis $H_0$ states that $m_j$ for the $j$th host’s clock skew is
within the range of the $i$th host’s confidence interval. The second hypothesis $H_1$ states that
$m_j$ for the $j$th host’s clock skew is outside of this range. The lower bound of the
confidence interval for a host $i$ is represented by $C_{L,i}$, and the upper bound is represented
by the value $C_{U,i}$. If $m_j$ falls within the confidence interval

$$C_{L,i} < m_j < C_{U,i}, \tag{11}$$

then hypothesis $H_0$ is accepted and the IP addresses are flagged as originating
from the same host. If not, then hypothesis $H_1$ is accepted and the IP addresses are taken not to
originate from the same host [20]. The process for this analysis is shown in Figure 9.
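
The pairwise test can be sketched as below. This is an illustration under assumptions, not the analysis code used in the experiment: the interval for each host is built here from the sample mean and sample standard deviation of its per-trial skews, the function names are hypothetical, and the per-trial values are invented.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(skews, eps=0.05):
    """Gaussian interval (C_L, C_U) for one host's per-trial skews."""
    m, s = mean(skews), stdev(skews)
    half = NormalDist().inv_cdf(1 - eps / 2) * s
    return m - half, m + half


def flag_multihomed(trials, eps=0.05):
    """Return IP pairs (i, j) where the mean skew of j falls inside the
    confidence interval of i, i.e. where H0 is accepted."""
    flagged = []
    hosts = sorted(trials)
    for i in hosts:
        c_l, c_u = confidence_interval(trials[i], eps)
        for j in hosts:
            if i != j and c_l < mean(trials[j]) < c_u:
                flagged.append((i, j))
    return flagged


# Invented per-trial skews in ppm for three IP addresses.
trials = {
    "10.10.13.89": [10.12, 10.14, 10.13, 10.15],
    "10.10.13.100": [10.13, 10.15, 10.14, 10.14],
    "10.10.13.6": [17.11, 17.13, 17.12, 17.14],
}
flagged = flag_multihomed(trials)
```

The test is applied in both directions because the interval of host $i$ can contain the mean of host $j$ without the reverse being true when the two hosts have different spreads.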



Figure 9. Process of Testing Hypotheses Using Confidence Intervals to
Determine If Hosts Are Multi-Homed

In this chapter, a scheme for detecting a multi-homed host active on a SDN using
information from TCP traffic on the network was introduced. From the observed TCP
timestamps, the clock skew between an active device’s system clock and the system
clock of a designated fingerprinter can be calculated. This is a unique value for a device
and can be used as an identifier for that device. The validity of the scheme is tested in
Chapter IV using a SDN test bed.
IV. TESTING AND RESULTS


A scheme for calculating clock skew based on TCP traffic was proposed in
Chapter III. This scheme was validated using a SDN test bed for data collection. The
configuration of the network used and the means for generating and capturing the test
traffic are described in this chapter. We then calculate the clock skew of each host and
apply the confidence interval analysis to the clock skew of each host to identify the
multi-homed host.

A. NETWORK CONFIGURATION 

A portion of the SDN test bed that was built for testing in [21] was used in this
experiment and consisted of two HP switches and seven Raspberry Pis as hosts. The
switches used were the HP 2920 and the HP 3800, and the Raspberry Pis were connected
to the network using their built-in 10/100 Mbps Ethernet connection. The network
configuration that was used is shown in Figure 10.



Figure 10. Network Configuration Used in Testing 
One of the Raspberry Pis had an added USB 2.0 Gigabit LAN adapter that was
used as its second connection to the network. The connections for this Raspberry Pi are
shown in Figure 11. This was the dual-homed device used in testing and the host that was
to be experimentally identified. This host used the IP addresses 10.10.13.89 and
10.10.13.100. Both connections from this host were connected to the HP 2920 switch.



Figure 11. Dual-Homed Raspberry Pi Used in Testing 


Also connected to the network was a Dell T1600 running Ubuntu that was acting 
as the DHCP server for the network. The DHCP server was used as the fingerprinter in 
this experiment and was chosen due to the fact that it maintained a static IP address of 
10.10.13.1 throughout testing. 

B. TRAFFIC GENERATION AND COLLECTION 

In order to establish the necessary TCP connections for the purpose of creating
TCP timestamps, traffic was generated by creating a Secure Shell (SSH) connection
between the fingerprinter and the hosts on the network. This SSH connection allowed for
the required TCP handshakes to be made and timestamps to be exchanged between the
host and the fingerprinter for collection. Packets with TCP timestamps that were
originating from a host were collected using Wireshark. An example from Wireshark of
the initiation of the TCP connection is shown in Figure 12. The internals of the packet are
depicted in Figure 13 with the TCP Options segment highlighted to show the timestamp
value (TSval) of the packet from 10.10.13.1 and the timestamp echo reply (TSecr) of the
timestamp in the last packet from 10.10.13.6.

[Figure 12: Wireshark summary of a TCP segment from 10.10.13.1 (port 39698) to 10.10.13.6 (port 22)]

Figure 12. TCP Connection Being Made Between Fingerprinter and Host

Ethernet II, Src: Dell_8e:60:a2 (d4:be:d9:8e:60:a2), Dst: Raspberr_a4:78:0a (b8:27:eb:a4:78:0a)
Internet Protocol Version 4, Src: 10.10.13.1 (10.10.13.1), Dst: 10.10.13.6 (10.10.13.6)
Transmission Control Protocol, Src Port: 39698 (39698), Dst Port: 22 (22), Seq: 3789, Ack: 9337, Len: 0
    Source Port: 39698 (39698)
    Destination Port: 22 (22)
    [Stream index: 2]
    [TCP Segment Len: 0]
    Sequence number: 3789 (relative sequence number)
    Acknowledgment number: 9337 (relative ack number)
    Header Length: 32 bytes
    Flags: 0x010 (ACK)
    Window size value: 305
    [Calculated window size: 39040]
    [Window size scaling factor: 128]
    Checksum: 0x0d66 [validation disabled]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
        No-Operation (NOP)
        No-Operation (NOP)
        Timestamps: TSval 950958840, TSecr 1433177783
    [SEQ/ACK analysis]

Figure 13. TCP Portion of Packet Showing Timestamp Information


C. CLOCK SKEW CALCULATION AND RESULTS 

Given the test traffic collected by Wireshark, the next step was to calculate the
clock skew of each host. One hundred samples of data were collected at ten-minute
intervals, and MATLAB was used for calculations. Using the MATLAB function
linprog, we solved (9) from Chapter III for each host. The solution provided the values of
$\alpha$ and $\beta$, which are the slope and $y$-intercept of the solution to (8). The value of $\alpha$
corresponds to the clock skew and is the value of concern in this scenario.

The clock skew for each host was calculated independently for each trial using the 
MATLAB code in the Appendix. The upper-bound solution for the set of points $O_T$ was
solved for each host; the upper bound was used because the delays found within a network
between hosts are all positive [5]. As shown in Figure 14, the solution for the set of data points in
red corresponding to host 10.10.13.100 provides a slope of 0.0000101203 or 10.1203 
ppm for the line in blue representing the upper bound of the data set. This slope is the 
clock skew for this host when compared to the clock of the fingerprinter, 10.10.13.1. 
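
To give a sense of scale for the slope reported above, the unit conversion and the drift it implies can be worked out directly. The 600 s and one-day spans below are chosen only for illustration and are not durations taken from the experiment.

```latex
\begin{align*}
\alpha &= 0.0000101203 = 10.1203 \times 10^{-6} = 10.1203\ \text{ppm},\\
\alpha \times 600\ \text{s} &\approx 6.07 \times 10^{-3}\ \text{s} = 6.07\ \text{ms},\\
\alpha \times 86{,}400\ \text{s} &\approx 0.874\ \text{s}.
\end{align*}
```

Under this reading, the host's clock would gain roughly 6 ms on the fingerprinter over ten minutes, or a little under one second per day.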



Figure 14. Upper-Bound Solution for Host 10.10.13.100 over a Single Trial 

Comparing the slopes for the upper-bound solution of the data sets of all hosts 
over a single trial shows the variation of the clock skews found in this network. As seen 
in Figure 15, there is a range of positive and negative values for the clock skew 
corresponding to a host’s clock being ahead of or behind the clock of the fingerprinter.
The hosts using the IP addresses of 10.10.13.89 and 10.10.13.100 both have solutions 
with similar slopes and stand out as possibly being multi-homed due to the fact that the 
solution to (8) for each host appears to be represented by two parallel lines. 



Figure 15. Upper Bound Solution of All Hosts over a Single Trial 

The data in Figure 15 is supported by further trials. The mean value for each clock 
skew after 100 trials is depicted in Table 1. This data shows that the clock skews for 
10.10.13.89 and 10.10.13.100 are similar. When compared to the differences between 
clock skews of the other hosts tested, as shown in Table 2, the difference between 
10.10.13.89 and 10.10.13.100 appears to be negligible. 
Table 1. Mean Clock Skew of All Hosts over 100 Trials (in ppm)

Host            Clock Skew (ppm)
10.10.13.6            17.126
10.10.13.32           -1.953
10.10.13.33           -6.405
10.10.13.35           -7.313
10.10.13.37            6.700
10.10.13.89           10.132
10.10.13.91           13.020
10.10.13.100          10.140



Table 2. Difference of Clock Skew Between All Hosts (in ppm)
(column headings give the last octet of 10.10.13.x)

Host                .6     .32     .33     .35     .37     .89     .91    .100
10.10.13.6       0.000  19.078  23.531  24.439  10.426   6.994   4.106   6.986
10.10.13.32     19.078   0.000   4.453   5.360   8.653  12.084  14.972  12.092
10.10.13.33     23.531   4.453   0.000   0.908  13.105  16.537  19.425  16.545
10.10.13.35     24.439   5.360   0.908   0.000  14.013  17.445  20.332  17.452
10.10.13.37     10.426   8.653  13.105  14.013   0.000   3.432   6.320   3.440
10.10.13.89      6.994  12.084  16.537  17.445   3.432   0.000   2.888   0.008
10.10.13.91      4.106  14.972  19.425  20.332   6.320   2.888   0.000   2.880
10.10.13.100     6.986  12.092  16.545  17.452   3.440   0.008   2.880   0.000
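
The comparison behind Table 2 can be reproduced from the mean values in Table 1. The
Python sketch below computes the absolute difference for every pair of hosts and sorts
the pairs; the function and variable names are assumptions. Because it starts from the
rounded means, a few entries differ from Table 2 in the last digit (for example, 19.079
instead of 19.078).

```python
from itertools import combinations

# Mean clock skews in ppm, copied from Table 1.
mean_skew_ppm = {
    "10.10.13.6": 17.126,
    "10.10.13.32": -1.953,
    "10.10.13.33": -6.405,
    "10.10.13.35": -7.313,
    "10.10.13.37": 6.700,
    "10.10.13.89": 10.132,
    "10.10.13.91": 13.020,
    "10.10.13.100": 10.140,
}


def pairwise_differences(skews):
    """Map each unordered pair of hosts to the absolute skew difference."""
    return {(a, b): abs(skews[a] - skews[b]) for a, b in combinations(skews, 2)}


diffs = pairwise_differences(mean_skew_ppm)
# Pairs ordered from most similar to least similar clock skew.
closest = sorted(diffs, key=diffs.get)
```

The first pair in closest is 10.10.13.89 and 10.10.13.100, separated by about 0.008 ppm;
the next closest pair, 10.10.13.33 and 10.10.13.35, is separated by about 0.908 ppm.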


For these comparisons and for the calculation of the confidence intervals, the data
was assumed to approach a Gaussian distribution after the 100 trials. As shown in Figure
16, the range of clock skews collected for host 10.10.13.6 over these trials approaches a
normal distribution.



[Histogram; x-axis: Clock Skew (ppm), ticks from 16.7 to 17.6]

Figure 16. Histogram of the Calculated Clock Skews of Host 10.10.13.6 over
100 Trials as They Approach a Gaussian Distribution

A 95% confidence interval for the clock skew of each host was calculated over
the 100 trials conducted. The confidence interval was computed using the paramci function
within MATLAB. The results for the confidence intervals are shown in Figure 17. In
Figure 17 the value of the clock skew for each host is shown as a bar graph in blue. The
error bar in red covers the range of values from the lower to the upper bound of the
confidence interval. The confidence interval for each clock skew is quite small, which
suggests that the clock skew varies only slightly over time; this result has been observed
in previous work [5], [6].
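
For readers without MATLAB, the interval for the mean can be approximated as follows.
This Python sketch is not a reimplementation of paramci: it assumes normally distributed
samples and uses the standard normal quantile, because the Python standard library has no
Student's t quantile. A t-based interval would be slightly wider (for 100 trials the
multiplier is roughly 1.98 instead of 1.96). The function name and the sample values are
hypothetical.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, level=0.95):
    """Return (lower, mean, upper) for the mean of the samples,
    using a normal approximation to the sampling distribution."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    half_width = z * stdev(samples) / n ** 0.5     # z times the standard error
    m = mean(samples)
    return m - half_width, m, m + half_width


# Hypothetical per-trial clock skews in ppm, not measured data.
skews_ppm = [17.05, 17.10, 17.12, 17.15, 17.20]
lower, centre, upper = mean_confidence_interval(skews_ppm)
```

The half-width shrinks with the square root of the number of trials, which is consistent
with the narrow intervals obtained after 100 trials.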

[Bar chart with error bars; x-axis: Host IP]

Figure 17. Confidence Interval of 95% for the Clock Skew of All Hosts over 100 Trials

D. DETECTION OF THE DUAL-HOMED HOST 

As described in Chapter III, analysis of the confidence intervals of the clock skew
for each host was used to determine which hosts were possibly multi-homed. Using the
confidence intervals as presented in Figure 17, we applied the ideas presented in Chapter
III to the given data. When the mean clock skew of each host is compared to the
confidence interval calculated for all other hosts, the possible dual-homed host can be
identified. The upper and lower bounds for the confidence interval for the clock skews of
all hosts are shown in Table 3 along with the mean value of the clock skews calculated
over 100 trials.


Table 3. Upper and Lower Bounds of the 95% Confidence Interval of Each
Host's Clock Skew (in ppm)

Host            Upper Bound CI   Mean Value   Lower Bound CI
10.10.13.6          17.147         17.126         17.104
10.10.13.32         -1.860         -1.953         -2.045
10.10.13.33         -6.276         -6.405         -6.534
10.10.13.35         -7.184         -7.313         -7.441
10.10.13.37          6.757          6.700          6.643
10.10.13.89         10.171         10.132         10.093
10.10.13.91         13.085         13.020         12.955
10.10.13.100        10.176         10.140         10.104


When the mean value of each calculated clock skew is compared to the
confidence interval of the clock skew for each host, it is observed that the possible
dual-homed hosts are 10.10.13.89 and 10.10.13.100. The confidence intervals for all hosts
are shown in Figures 18-25. The confidence intervals for the hosts 10.10.13.6,
10.10.13.32, 10.10.13.33, 10.10.13.35, 10.10.13.37 and 10.10.13.91 are shown in Figures
18-22 and Figure 24, respectively; the confidence interval of the designated host is in
blue, while the mean clock skews of all hosts on the network are in red. As can be seen
in these figures, the confidence interval for a given host includes only the value of its
own clock skew. In Figure 23 and Figure 25, the clock skews of the hosts represented by
the IP addresses 10.10.13.89 and 10.10.13.100 fall within each other's confidence
interval, while the other hosts remain outside of these bounds. Together with the data in
Table 3, Figure 23 and Figure 25 confirm the initial network setup, in which the IP
addresses 10.10.13.89 and 10.10.13.100 belonged to the same Raspberry Pi.
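
The comparison described above can be written as a short check over the values in
Table 3. In the Python sketch below, two addresses are flagged only when each one's mean
clock skew lies inside the other's confidence interval. The function and variable names
are assumptions; the numbers are copied from Table 3.

```python
# host: (lower bound, mean, upper bound) of the clock skew in ppm, from Table 3
table3 = {
    "10.10.13.6": (17.104, 17.126, 17.147),
    "10.10.13.32": (-2.045, -1.953, -1.860),
    "10.10.13.33": (-6.534, -6.405, -6.276),
    "10.10.13.35": (-7.441, -7.313, -7.184),
    "10.10.13.37": (6.643, 6.700, 6.757),
    "10.10.13.89": (10.093, 10.132, 10.171),
    "10.10.13.91": (12.955, 13.020, 13.085),
    "10.10.13.100": (10.104, 10.140, 10.176),
}


def candidate_pairs(intervals):
    """Return pairs of hosts whose means fall inside each other's interval."""
    hosts = list(intervals)
    pairs = []
    for i, a in enumerate(hosts):
        lo_a, mean_a, hi_a = intervals[a]
        for b in hosts[i + 1:]:
            lo_b, mean_b, hi_b = intervals[b]
            if lo_a <= mean_b <= hi_a and lo_b <= mean_a <= hi_b:
                pairs.append((a, b))
    return pairs


pairs = candidate_pairs(table3)
```

With the Table 3 values, the only pair returned is 10.10.13.89 and 10.10.13.100.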

Figure 18. Confidence Interval of 10.10.13.6 Compared to the Mean Value of
All Clock Skews Calculated

Figure 19. Confidence Interval of 10.10.13.32 Compared to the Mean Value of
All Clock Skews Calculated

Figure 20. Confidence Interval of 10.10.13.33 Compared to the Mean Value of
All Clock Skews Calculated

Figure 21. Confidence Interval of 10.10.13.35 Compared to the Mean Value of
All Clock Skews Calculated

Figure 22. Confidence Interval of 10.10.13.37 Compared to the Mean Value of
All Clock Skews Calculated

Figure 23. Confidence Interval of 10.10.13.89 Compared to the Mean Value of
All Clock Skews Calculated

Figure 24. Confidence Interval of 10.10.13.91 Compared to the Mean Value of
All Clock Skews Calculated

Figure 25. Confidence Interval of 10.10.13.100 Compared to the Mean Value of
All Clock Skews Calculated

E. VALIDATING DETECTION SCHEME WITH A SECOND DUAL-HOMED HOST


The results from this testing were validated by moving the dual-homed
connection to another device and repeating the proposed detection scheme. The USB 2.0
Gigabit LAN adapter was moved from the host using the IP addresses 10.10.13.89 and
10.10.13.100 to the host that was previously using the IP address 10.10.13.6. This device
was now the dual-homed device and was also using the IP address 10.10.13.89. After
generating traffic as in the previous experiment and calculating the clock skews, we
determined that the dual-homed connection could still be detected. As shown in Figure
26, the upper-bound solutions to (8) for the hosts 10.10.13.89 and 10.10.13.100 are no
longer parallel. Instead, the parallel solutions now belong to 10.10.13.6 and
10.10.13.89, which is consistent with the change in network configuration.



Figure 26. Upper Bound Solution of All Hosts over a Single Trial after the
Second Connection Was Shifted to the Host with IP Address 10.10.13.6

When the confidence intervals of the mean clock skews were compared after 30
trials, the detection scheme correctly identified the dual-homed host as the device using
10.10.13.6 and 10.10.13.89. In Figure 27 the confidence interval for 10.10.13.6 is shown
in relation to the clock skews of the hosts on the network. This confidence interval
contains the clock skews for 10.10.13.6 and 10.10.13.89. The same outcome is shown in
Figure 28, where the confidence interval for 10.10.13.89 contains the clock skews for
10.10.13.6 and 10.10.13.89. These results confirm the change in network configuration.



Figure 27. Confidence Interval of 10.10.13.6 After Shifting the Dual Connection
Compared to the Mean Value of All Clock Skews Calculated

Figure 28. Confidence Interval of 10.10.13.89 After Shifting the Dual Connection
Compared to the Mean Value of All Clock Skews Calculated


F. VALIDATING DETECTION SCHEME WITH A MULTI-HOMED HOST 

The final validation of the proposed scheme was to add a host with three
interfaces to the network and attempt its detection. A Raspberry Pi was connected to the
network using its standard built-in Ethernet connection as well as two USB-to-Ethernet
adapters. These interfaces were assigned the IP addresses 10.10.13.89, 10.10.13.91, and
10.10.13.100. As in the previous sections, the clock skews for all hosts on the network
were calculated, and the proposed scheme was used to correlate any possible multi-homed
connections. As seen in Figure 29, there are now three parallel lines for the solutions
to (8), suggesting that these IP addresses are from the multi-homed host.

Figure 29. Upper Bound Solution of All Hosts over a Single Trial after the Three
Connections Were Made to the Network from One Host

This is confirmed when the mean clock skews of these three addresses are compared to
each other's confidence intervals, as was done in the previous sections. In Figure 30 the
confidence interval for 10.10.13.89 is shown to contain the value of the clock skews for
10.10.13.89, 10.10.13.91, and 10.10.13.100. This result is repeated for the confidence
interval of 10.10.13.91 in Figure 31 and the confidence interval for 10.10.13.100 in
Figure 32. These results confirm the change in network configuration, in which all three
IP addresses originate from the same device.
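
When more than two addresses match, the matching pairs can be merged into one group per
suspected device. The Python sketch below does this with a small union-find structure.
Merging pairs transitively is an assumption that goes beyond the pairwise comparison used
in this chapter, and the function name and the shortened host list are illustrative; the
three pairs are the matches reported for Figures 30-32.

```python
def group_interfaces(hosts, pairs):
    """Merge matching address pairs into groups of addresses that are
    suspected to belong to one device; unmatched addresses are omitted."""
    parent = {h: h for h in hosts}

    def find(h):
        # Follow parent links to the group representative, halving the path.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    return [members for members in groups.values() if len(members) > 1]


# Illustrative subset of hosts; the pairs are the matches from Figures 30-32.
hosts = ["10.10.13.6", "10.10.13.32", "10.10.13.89", "10.10.13.91", "10.10.13.100"]
pairs = [
    ("10.10.13.89", "10.10.13.91"),
    ("10.10.13.89", "10.10.13.100"),
    ("10.10.13.91", "10.10.13.100"),
]
devices = group_interfaces(hosts, pairs)
```

Here devices holds a single group containing the three addresses of the multi-homed host.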

Figure 30. Confidence Interval of 10.10.13.89 When Three Connections Are
Made to the Network from One Host

Figure 31. Confidence Interval of 10.10.13.91 When Three Connections Are
Made to the Network from One Host

Figure 32. Confidence Interval of 10.10.13.100 When Three Connections Are
Made to the Network from One Host


The testing and analysis presented in this chapter demonstrated that clock skew
information can be used to identify traffic from different IP addresses that represent the
same multi-homed host. This testing was successfully validated by shifting the
multi-homed connection between devices and executing the same methods of detection.
Finally, it was shown that the proposed scheme can be used to detect a device using three
separate interfaces to connect to the network.

V. CONCLUSION


The idea of using clock skews to remotely identify a device was presented in [5]
and further tested in [19]. Continuing this work, we determined that the clock skew of a
device can be detected independently of the interface used by that device to connect to
the network [6]. This idea was used in previous work for enumeration behind a NAT [7]
and was explored in this thesis as a means of detecting a multi-homed host using multiple
interfaces.

The motivation for this work was to improve the security of a network and the 
integrity of its firewall. Since a multi-homed host can be used to bypass a network’s 
firewall and connect directly to the Internet, it is important to be able to detect the 
presence of such devices. A scheme to use the clock skew of a device as an identifier that 
is independent of the interface the device used to connect to the network was developed 
and tested in this work. Since the clock skew of a host stays relatively constant over time 
[5] and is independent of the interface used [6], it was proposed that this can be used to 
correlate traffic that appears to be coming from different source IP addresses as traffic 
from the same host. 

A. SIGNIFICANT RESULTS 

The proposed detection scheme used network traffic and system clock data in 
order to identify possible multi-homed hosts on a network. The concept of using clock 
skew as a unique identifier for a host has been suggested and tested in literature, but this 
idea has not been utilized in attempting to detect a host on a network using multiple 
interfaces. These concepts and methods were used to create a model to detect a
multi-homed host from a designated fingerprinter. This information can then be used by
the controller in an SDN to create new flow rules and isolate a possible multi-homed host
for further investigation and to mitigate security risks.

The ability of a designated host to act as a fingerprinter and determine the clock
skews of each host on its subnet based on information from its own internal clock and
TCP timestamp information was demonstrated in this research. Based on this
information, it was shown, using analyses of the confidence intervals of a device's clock
skew compared to the calculated mean clock skew of all other devices on the network,
that the traffic from IP addresses that originated from the same host can be correlated to
one another.

The detection scheme was then repeated after shifting the dual-homed connection
to another device, and it successfully identified that host as dual-homed. Finally, it was
shown that it was possible to use this scheme to detect a device on the network using
three distinct interfaces.

B. RECOMMENDATIONS AND FUTURE WORK 

The concept of using clock skews to identify traffic from multiple IP addresses
that originated from the same host was presented in Chapter III and tested and validated
in Chapter IV. Another means of calculating clock skew is from timestamp data in ICMP
packets [5], [22]. This was not tested in this thesis and is another possible means of
detection that can be further explored for validation of these results or to improve the
granularity of detection.

The proposed scheme was implemented with a fingerprinter that was on the
same subnet as all other hosts. What was not shown in this research was the
implementation of this process from a panoptic, or comprehensive, viewpoint such as an
SDN controller. It was not demonstrated that the switches used in this network were
capable of forwarding OpenFlow packets with TCP header information from the data
plane, where the hosts and fingerprinter reside, to the control plane. A future effort could
focus on implementing this scheme in the control plane and designating the controller as
the fingerprinter for the entire network. This would provide a means of monitoring for and
reacting to the existence of a multi-homed host from a centralized location.

The proposed scheme was tested using seven Raspberry Pis, with one being
multi-homed. The next step is to increase the number of hosts on the network for a larger
sample size. While increasing the sample size, variety in the types of host used can be
introduced. Since all the hosts used in this thesis were of the same type, there was no
variation in operating system, motherboard, or network driver. Introducing variety in
devices on the network will further validate the proposed scheme.

In this thesis, testing was done using one SDN test bed with the assumption that
the fingerprinter could see all traffic on the network. The next step is to detect a
multi-homed host that is connected to multiple networks that are separated by a firewall.
Detecting the presence of this device would achieve the end goal of identifying a device
on an SDN that presents a threat through its ability to bypass the network's security.

APPENDIX. MATLAB CODE FOR CALCULATING CLOCK SKEW


%% Load the data
Data = xlsread('test_21April.xls');
hosts = [4, 5, 6, 8, 9, 43, 89, 100];   % IP addresses observed, 10.10.13.*
L = length(hosts);
Z = zeros(1, L);                        % clock skew of each host

%% Calculate the clock skew and plot the data
for n = 1:L
    [row, ~] = find(Data == hosts(n));  % extract data for a given IP address
    Hosts = Data(row, :);               % create a matrix for that data
    N = size(Hosts, 1);                 % number of samples for this host
    x = zeros(1, N);
    v = zeros(1, N);
    for k = 1:N
        x(k) = Hosts(k,3) - Hosts(1,3); % calculate the time offset
        v(k) = Hosts(k,2) - Hosts(1,2); % calculate the timestamp offset
    end

    b = ones(length(x), 1);
    a = [x' b];
    f = [sum(x)/length(x) 1];
    I = linprog(f, -a, -v');            % linear programming solution for Hz

    w = zeros(1, N);
    y = zeros(1, N);
    for k = 1:N
        w(k) = v(k) / round(I(1));      % adjusting v based on Hz
        y(k) = w(k) - x(k);             % difference between observed and actual time
    end

    z = linprog(f, -a, -y');            % upper-bound line; its slope is the clock skew
    Z(n) = z(1);

    figure
    hold on
    plot(x, y, 'r.')
    h = refline(z(1), z(2));            % plotting the upper-bound line
    set(h, 'linewidth', 2.5);
    title(['Clock Skew for host 10.10.13.' num2str(hosts(n))])
    xlabel('Time offset (seconds)')
    ylabel('Timestamp offset (seconds)')
    clear Hosts x v w y b a f I z
end

format long
fprintf('      Host      Clock Skew \n')
fprintf('%10.6f %15.6f\n', [hosts' Z']')